
    dotCall64: An Efficient Interface to Compiled C/C++ and Fortran Code Supporting Long Vectors

    The R functions .C() and .Fortran() can be used to call compiled C/C++ and Fortran code from R. This so-called foreign function interface is convenient, since it does not require any interaction with the C API of R. However, it does not support long vectors (i.e., vectors with 2^31 or more elements). To overcome this limitation, the R package dotCall64 provides .C64(), which can be used to call compiled C/C++ and Fortran functions. It transparently supports long vectors and performs the necessary casts to pass numeric R vectors to 64-bit integer arguments of the compiled code. Moreover, .C64() features a mechanism to avoid unnecessary copies of function arguments, making it efficient in terms of speed and memory usage. Comment: 17 pages
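
The casting problem described above can be illustrated outside R. The Python sketch below uses the standard-library ctypes module to mimic a long-vector length (which R reports as a double once it exceeds the 32-bit integer range) being converted to a 32-bit versus a 64-bit C integer. The helpers to_int32 and to_int64 are hypothetical, and the checks inside to_int64 are illustrative assumptions, not a description of what .C64() does internally.

```python
import ctypes

INT32_MAX = 2**31 - 1      # largest value a signed 32-bit C int can hold
LONG_LENGTH = 2**31        # hypothetical long-vector length, one past INT32_MAX


def to_int32(x: float) -> int:
    """C-style conversion to a 32-bit int; ctypes wraps around silently on overflow."""
    return ctypes.c_int32(int(x)).value


def to_int64(x: float) -> int:
    """Checked conversion of an integral double to a 64-bit int (illustrative only)."""
    if x != int(x):
        raise ValueError(f"{x!r} is not an integer value")
    # Assumed check: doubles represent every integer exactly only up to 2**53.
    if abs(x) > 2**53:
        raise OverflowError(f"{x!r} exceeds the exactly representable integer range")
    return ctypes.c_int64(int(x)).value


if __name__ == "__main__":
    length_as_double = float(LONG_LENGTH)   # R reports long-vector lengths as doubles
    print(to_int32(length_as_double))       # wraps around to a negative number
    print(to_int64(length_as_double))       # preserved exactly
```

The 32-bit conversion silently produces a negative length, which is why compiled code that indexes long vectors needs 64-bit integer arguments.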

    gsbDesign: An R Package for Evaluating the Operating Characteristics of a Group Sequential Bayesian Design

    The R package gsbDesign provides functions to evaluate the operating characteristics of Bayesian group sequential clinical trial designs. More specifically, we consider clinical trials with interim analyses, which compare a treatment with a control, and where the endpoint is normally distributed. Prior information can either be specified for the difference between treatment and control, or separately for the effects in the treatment and the control groups. At each interim analysis, the decision to stop or continue the trial is based on the posterior distribution of the difference between treatment and control. The decision at the final analysis is also based on this posterior distribution. Multiple success and/or futility criteria can be specified to adequately reflect medical decision-making. We describe methods to evaluate the operating characteristics of such designs for scenarios corresponding to different true treatment and control effects. The characteristics of main interest are the probabilities of success and futility at each interim analysis, and the expected sample size. We illustrate the use of gsbDesign with a detailed case study.
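
Operating characteristics of this kind can be approximated by Monte Carlo simulation. The Python sketch below assumes a normal endpoint with known standard deviation sigma, a normal prior on the treatment-control difference delta, and a single success and a single futility criterion, each of the form P(delta > threshold | data) compared with a probability cutoff; trials that reach the final analysis without meeting the success criterion are counted as failures at that stage. All function names, parameters, and default values are hypothetical, and gsbDesign itself may compute these quantities differently.

```python
import random
from statistics import NormalDist


def posterior(d_hat, n_per_arm, sigma, prior_mean, prior_sd):
    """Conjugate normal update for the treatment-control difference delta."""
    like_var = 2 * sigma**2 / n_per_arm           # variance of the observed mean difference
    precision = 1 / prior_sd**2 + 1 / like_var
    mean = (prior_mean / prior_sd**2 + d_hat / like_var) / precision
    return NormalDist(mean, (1 / precision) ** 0.5)


def operating_characteristics(true_delta, stages, sigma=1.0, prior_mean=0.0,
                              prior_sd=10.0, success=(0.0, 0.975),
                              futility=(0.2, 0.10), n_sim=5000, seed=1):
    """Monte Carlo estimate of stopping probabilities and expected total sample size.

    stages:   cumulative sample size per arm at each analysis, e.g. [20, 40, 60]
    success:  (threshold, p): stop for success if P(delta > threshold | data) >= p
    futility: (threshold, p): stop for futility if P(delta > threshold | data) < p
    """
    rng = random.Random(seed)
    k = len(stages)
    n_success, n_no_success = [0] * k, [0] * k
    total_n = 0
    for _ in range(n_sim):
        n_prev, d_hat = 0, 0.0
        for j, n in enumerate(stages):
            m = n - n_prev                                   # new patients per arm
            block = rng.gauss(true_delta, (2 * sigma**2 / m) ** 0.5)
            d_hat = (n_prev * d_hat + m * block) / n         # cumulative mean difference
            n_prev = n
            post = posterior(d_hat, n, sigma, prior_mean, prior_sd)
            if 1 - post.cdf(success[0]) >= success[1]:
                n_success[j] += 1
                break
            # Assumption: reaching the final analysis without success counts as failure.
            if 1 - post.cdf(futility[0]) < futility[1] or j == k - 1:
                n_no_success[j] += 1
                break
        total_n += 2 * n_prev                                # both arms
    p_success = [c / n_sim for c in n_success]
    p_no_success = [c / n_sim for c in n_no_success]
    return p_success, p_no_success, total_n / n_sim


if __name__ == "__main__":
    for delta in (0.0, 0.5):
        p_s, p_f, exp_n = operating_characteristics(delta, [20, 40, 60])
        print(delta, [round(p, 3) for p in p_s], [round(p, 3) for p in p_f], round(exp_n, 1))
```

Running the same design under several true differences (here 0.0 and 0.5) gives the per-analysis success and stopping probabilities and the expected sample size for each scenario.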

    Statistical tools to model space-time data with a focus on biodiversity applications

    Statistical models are important means to analyze and interpret space-time data, such as satellite datasets and ecological field measurements. However, complex data structures and increasing dataset sizes make it impossible to use standard geostatistical methods like kriging. The resulting methodological gap opens up an active and attractive research area, namely applied spatio-temporal statistics for large datasets. The advances presented herein are mainly motivated by ecological research questions centered on the Arctic vegetation and its response to global warming.
Quantitative statements about the Arctic vegetation are typically based on two fundamentally different types of measurements: field measurements of biologically relevant parameters on the one hand and remotely sensed vegetation indices on the other. Both techniques yield spatio-temporal data and face various challenges that make it difficult to characterize vegetation at the pan-Arctic scale. For instance, field measurements are spatially sparse, and satellite observations are often confounded by cloud-, snow-, and water-covered surfaces. This PhD thesis presents a series of statistical and computational developments that help to improve the quality of quantitative statements about the Arctic vegetation. The thesis is structured into five self-contained paper manuscripts: Paper I is concerned with making the sparse matrix algebra R package spam capable of handling large 64-bit matrices with 2^31 or more non-zero elements. This enabled fitting a non-stationary spatial Gaussian process model to a large remote-sensing-based vegetation index dataset. The 64-bit extension is based on the R package dotCall64, which is discussed in detail in Paper II. Paper III introduces a new spatio-temporal prediction method for missing values in satellite data. The method predicts each missing value separately by selecting a suitable spatio-temporal subset, followed by an image sorting procedure and quantile regression. To process massive amounts of data on large computer systems, the corresponding R package gapfill features a modular design with an emphasis on parallel computing. Paper IV elaborates on different implementation and validation strategies for spatial Bayesian hierarchical models for count data. As sketched in the introduction of the thesis, advances in that direction are promising for jointly modeling data from various sources, such as Arctic plant abundance data and remotely sensed vegetation indices.
Finally, Paper V presents a case study in which Arctic plot-scale biodiversity measurements are related to remote-sensing-based landscape characterizations. More precisely, it explores relations between biodiversity indices derived from field measurements of the Arctic Vegetation Archive and landscape characterizations based on vegetation index data as well as a digital elevation model.
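
The 2^31 limit mentioned for Paper I can be made concrete with a compressed sparse row (CSR) layout, in which a row-pointer array stores cumulative counts of non-zero elements, so its last entry equals the total number of non-zeros. The Python sketch below assumes a CSR-style layout similar to the one spam uses; the function names to_csr and index_width are hypothetical. It builds the arrays for a small dense matrix and reports whether the largest stored index still fits in a signed 32-bit integer.

```python
INT32_MAX = 2**31 - 1


def to_csr(dense):
    """Convert a list-of-lists matrix to 0-based CSR arrays (entries, colindices, rowpointers)."""
    entries, colindices, rowpointers = [], [], [0]
    for row in dense:
        for j, value in enumerate(row):
            if value != 0:
                entries.append(value)
                colindices.append(j)
        rowpointers.append(len(entries))   # cumulative count of non-zeros so far
    return entries, colindices, rowpointers


def index_width(max_index):
    """Signed integer width (in bits) needed to store `max_index`.

    With 0-based row pointers the largest stored value is nnz; a 1-based
    (Fortran-style) layout would need to store nnz + 1.
    """
    return 32 if max_index <= INT32_MAX else 64


if __name__ == "__main__":
    entries, cols, rowptr = to_csr([[1, 0, 2], [0, 0, 3], [4, 5, 0]])
    print(entries, cols, rowptr)     # [1, 2, 3, 4, 5] [0, 2, 2, 0, 1] [0, 2, 3, 5]
    print(index_width(rowptr[-1]))   # 32
    print(index_width(2**31))        # 64
```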

    AspectCSE: Sentence Embeddings for Aspect-based Semantic Textual Similarity using Contrastive Learning and Structured Knowledge

    Generic sentence embeddings provide a coarse-grained approximation of semantic textual similarity but ignore specific aspects that make texts similar. Conversely, aspect-based sentence embeddings provide similarities between texts based on certain predefined aspects. Thus, similarity predictions of texts are more targeted to specific requirements and more easily explainable. In this paper, we present AspectCSE, an approach for aspect-based contrastive learning of sentence embeddings. Results indicate that AspectCSE achieves an average improvement of 3.97% on information retrieval tasks across multiple aspects compared to the previous best results. We also propose using Wikidata knowledge graph properties to train models of multi-aspect sentence embeddings in which multiple specific aspects are simultaneously considered during similarity predictions. We demonstrate that multi-aspect embeddings outperform single-aspect embeddings on aspect-specific information retrieval tasks. Finally, we examine the aspect-based sentence embedding space and demonstrate that embeddings of semantically similar aspect labels are often close, even without explicit similarity training between different aspect labels. Comment: Accepted to the 14th International Conference on Recent Advances in Natural Language Processing (RANLP 2023)
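
Contrastive objectives of this kind typically pull an anchor sentence toward a positive example (for aspect-based training, a sentence sharing the target aspect) and push it away from the other sentences in the batch. The Python sketch below computes a SimCSE-style InfoNCE loss with in-batch negatives on plain lists of floats; the use of cosine similarity, the temperature of 0.05, and the function names are assumptions, and AspectCSE's exact objective and positive-pair selection may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce(anchors, positives, temperature=0.05):
    """Mean InfoNCE loss: anchor i should match positive i against all other positives."""
    losses = []
    for i, a in enumerate(anchors):
        logits = [cosine(a, p) / temperature for p in positives]
        m = max(logits)                                         # stabilize log-sum-exp
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        losses.append(log_denominator - logits[i])              # -log softmax of true pair
    return sum(losses) / len(losses)


if __name__ == "__main__":
    anchors = [[1.0, 0.0], [0.0, 1.0]]
    aligned = [[0.9, 0.1], [0.1, 0.9]]   # each positive is close to its own anchor
    swapped = [[0.1, 0.9], [0.9, 0.1]]   # positives matched to the wrong anchors
    print(info_nce(anchors, aligned), info_nce(anchors, swapped))
```

The loss is small when each anchor is most similar to its own positive and large when the pairing is wrong, which is the signal that shapes the embedding space during training.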

    Mobile Crowd Location Prediction with Hybrid Features using Ensemble Learning

    With the explosive growth of location-based services on mobile devices, predicting users’ future locations and trajectories is increasingly important for supporting proactive information services. In this paper, we model this problem as a supervised learning task and propose using ensemble learning methods with hybrid features to solve it. We characterize the properties of users’ visited locations and movement patterns and then extract three types of features (temporal, spatial, and system) to quantify the correlation between locations and features. We then apply ensemble methods to predict users’ future locations from the extracted features. Moreover, we design an adaptive Markov chain model to predict users’ trajectories between two locations. To evaluate system performance, we use a real-life dataset from the Nokia Mobile Data Challenge. The experimental results reveal several findings: (1) among individual predictors, Bayesian networks outperform all others when data quality is good, while J48 delivers the best results when data quality is poor; (2) ensemble predictors generally outperform individual predictors under all conditions; and (3) ensemble predictor performance depends on users’ movement patterns.
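
As a point of reference for the trajectory component, a plain first-order Markov chain predicts the next location from transition counts between consecutive visits. The Python sketch below is not the paper's adaptive variant, whose adaptation rule the abstract does not describe; the class name MarkovPredictor and the location labels in the usage example are hypothetical.

```python
from collections import Counter, defaultdict


class MarkovPredictor:
    """First-order Markov chain over discrete location labels."""

    def __init__(self):
        self.transitions = defaultdict(Counter)   # from-location -> counts of next locations

    def fit(self, visits):
        """Count transitions between consecutive visits in one user's history."""
        for here, nxt in zip(visits, visits[1:]):
            self.transitions[here][nxt] += 1
        return self

    def probabilities(self, here):
        """Estimated P(next | here); an empty dict if `here` was never left."""
        counts = self.transitions.get(here, Counter())
        total = sum(counts.values())
        return {loc: c / total for loc, c in counts.items()} if total else {}

    def predict(self, here):
        """Most frequent next location, or None for an unseen state."""
        counts = self.transitions.get(here)
        return counts.most_common(1)[0][0] if counts else None


if __name__ == "__main__":
    history = ["home", "work", "gym", "home", "work", "home", "work", "gym"]
    model = MarkovPredictor().fit(history)
    print(model.predict("work"))          # gym
    print(model.probabilities("work"))    # {'gym': 0.666..., 'home': 0.333...}
```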

    Effect of Abduction Brace Wearing Compliance on the Results of Arthroscopic Rotator Cuff Repair

    Background: The benefit of protective bracing after rotator cuff reconstruction has been debated for many years, although immobilization compliance has never been assessed objectively to date. In a previous study, compliance with wearing an abduction brace was measured for the first time using temperature-sensitive sensors. The purpose of the present follow-up study was to assess the effect of immobilization compliance on tendon-healing after rotator cuff repair. Methods: The clinical and radiographic outcomes of 46 consecutive patients with objectively assessed abduction brace wearing compliance after arthroscopic repair of a superior rotator cuff tear were prospectively analyzed. Rotator cuff integrity was examined with ultrasound. Clinical outcomes were assessed with the relative Constant-Murley score (RCS), the Subjective Shoulder Value (SSV), and pain and patient satisfaction ratings. Receiver operating characteristic (ROC) curves were used to determine the optimal cutoff value of abduction brace compliance for discriminating between shoulders that will and will not have a retear, and to assess the association of compliance with failure of the rotator cuff repair. Results: After a mean follow-up of 20 ± 9 months, the odds of rotator cuff repair failure were 13-fold higher for patients with a compliance rate of <60% (p = 0.037). The retear rate was 3% (1 of 35 patients) in the high-compliance cohort (≥60% compliance) and 27% (3 of 11) in the low-compliance cohort (<60% compliance) (p = 0.037). No differences in RCS, SSV, pain, or postoperative patient satisfaction were observed between patients with ≥60% compliance and those with <60% compliance. Conclusions: Patients with a compliance rate of <60% had a 13-fold increase in the risk of rotator cuff retear. The 2 patients with the lowest compliance rates (11% and 22%) both had retears.
Due to the small sample size, no final conclusions can be drawn regarding the influence of immobilization compliance on tendon-healing after rotator cuff repair. These findings justify a prospective trial with a larger cohort to confirm or disprove the value of compliance with abduction bracing. Level of Evidence: Therapeutic Level II. See Instructions for Authors for a complete description of levels of evidence.
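
The reported 13-fold figure can be checked against the retear counts given in the abstract. Assuming the odds ratio is computed directly from the 2×2 table (3 retears among 11 low-compliance patients, 1 retear among 35 high-compliance patients) rather than from an adjusted model, the Python arithmetic below uses exact fractions; the helper names odds and odds_ratio are hypothetical.

```python
from fractions import Fraction


def odds(events, total):
    """Odds of the event: events divided by non-events."""
    return Fraction(events, total - events)


def odds_ratio(events_a, total_a, events_b, total_b):
    """Odds ratio of group a relative to group b."""
    return odds(events_a, total_a) / odds(events_b, total_b)


low = (3, 11)    # retears, patients with <60% compliance
high = (1, 35)   # retears, patients with >=60% compliance

if __name__ == "__main__":
    print(f"retear rate low:  {3 / 11:.0%}")    # 27%
    print(f"retear rate high: {1 / 35:.0%}")    # 3%
    print(float(odds_ratio(*low, *high)))       # 12.75
```

The odds are 3/8 in the low-compliance cohort and 1/34 in the high-compliance cohort, giving (3/8)/(1/34) = 12.75, which rounds to the reported 13-fold difference.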